Business information extraction from semi-structured webpages

Authors

  • Nahk Hyun Sung
  • YongSik Chang
Abstract

To protect online consumers, OECD guidelines recommend that Internet shopping malls provide information about their business on their webpages. In Korea, the Consumer Protection Law in Electronic Commerce requires Internet shopping malls to provide their business information so that consumers can easily identify them. Because most Korean Internet shopping malls present this business information in a semi-structured format on their homepages, a software agent can readily locate it. To automatically investigate whether Internet shopping malls provide their business information, this article proposes methods for gathering the URLs of Internet shopping malls, monitoring changes to their webpages, and extracting business information. Business information extraction in our research is based on synonyms and indicator words of the attributes. We used inductive learning to improve the efficiency of information extraction. Experiments demonstrated the potential of our agent system, whose average extraction accuracy was 89.3%. © 2004 Elsevier Ltd. All rights reserved.
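As a rough illustration of extraction driven by attribute synonyms, the following Python sketch maps label variants found in a semi-structured page footer to canonical attributes and reads the value that follows each label. It is not the authors' implementation: the synonym table, the attribute names, the `label: value` layout, the `|` and newline delimiters, and the function names `build_label_pattern` and `extract_business_info` are all assumptions made for this example, and the inductive learning step mentioned in the abstract is not reproduced.

```python
import re

# Hypothetical synonym table (not taken from the paper):
# canonical attribute -> label variants that may appear on a page.
SYNONYMS = {
    "company_name": ["company name", "trade name", "company"],
    "representative": ["representative", "ceo", "owner"],
    "registration_no": [
        "business registration number",
        "business registration no",
        "registration no",
    ],
    "phone": ["telephone", "phone", "tel"],
}


def build_label_pattern(synonyms):
    """Compile one regex that matches any known label followed by a colon."""
    labels = [label for variants in synonyms.values() for label in variants]
    # Longest labels first, so "company name" is tried before "company".
    labels.sort(key=len, reverse=True)
    alternation = "|".join(re.escape(label) for label in labels)
    return re.compile(rf"\b({alternation})\b\.?\s*:\s*", re.IGNORECASE)


def extract_business_info(text, synonyms=SYNONYMS):
    """Return {attribute: value} for every labelled field found in text."""
    label_to_attr = {
        label: attr for attr, variants in synonyms.items() for label in variants
    }
    matches = list(build_label_pattern(synonyms).finditer(text))
    result = {}
    for i, match in enumerate(matches):
        attr = label_to_attr[match.group(1).lower()]
        # A value runs until the next label, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        raw = text[match.end():end]
        # Assumed field delimiters: a vertical bar or a line break.
        value = re.split(r"[|\n]", raw, maxsplit=1)[0].strip(" ,;/")
        if value and attr not in result:  # keep the first value per attribute
            result[attr] = value
    return result


if __name__ == "__main__":
    # Made-up footer text with placeholder values.
    footer = (
        "Company Name: Example Mall Co. | CEO: Hong Gildong | "
        "Business Registration No.: 123-45-67890 | Tel: 02-000-0000"
    )
    print(extract_business_info(footer))
```

A fixed table like this only covers labels known in advance; a system of the kind the abstract describes would presumably extend or learn such mappings rather than hard-code them.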


Similar resources

Web Entities Extraction Based on Semi-Structured Semantic Database

The Web is the biggest source of information and contains many entities and relationships between them. Extracting these data from massive numbers of web pages and integrating them into semi-structured data with rich semantics would make these web data easier to manage and use. On this premise, a comprehensive method is proposed to extract the entities and relationships from the webpages...


Ontology Driven Web Extraction from Semi-structured and Unstructured Data for B2B Market Analysis

The Market Blended Insight project has the objective of improving UK business-to-business marketing performance using semantic web technologies. In this project, we are implementing an ontology driven web extraction and translation framework to supplement our backend triple store of UK companies, people and geographical information. It deals with both the semi-structured data and the un...


Automatic Record Extraction for the World Wide Web

As the amount of information on the World Wide Web grows, there is an increasing demand for software that can automatically process and extract information from web pages. Despite the fact that the underlying data on most web pages is structured, we cannot automatically process these web sites/pages as structured data. We need robust technologies that can automatically understand human-readable...


Web Data Extraction for Business Intelligence: The Lixto Approach

Knowledge about market developments and competitor activities on the market becomes more and more a critical success factor for enterprises. The World Wide Web provides public domain information which can be retrieved for example from Web sites or online shops. The extraction from semi-structured information sources is mostly done manually and is therefore very time consuming. This paper descri...


A Novel Architecture for Detecting Phishing Webpages using Cost-based Feature Selection

Phishing is one of the luring techniques used to exploit personal information. A phishing webpage detection system (PWDS) extracts features to determine whether it is a phishing webpage or not. Selecting appropriate features improves the performance of PWDS. Performance criteria are detection accuracy and system response time. The major time consumed by PWDS arises from feature extraction that ...



Journal title:
  • Expert Syst. Appl.

Volume 26


Publication date: 2004